 eigen value


Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning

arXiv.org Artificial Intelligence

The lottery ticket hypothesis for deep neural networks emphasizes the importance of the initialization used to re-train the sparser networks obtained through iterative magnitude pruning. An explanation for why the specific initialization proposed by the lottery ticket hypothesis tends to work better in terms of generalization (and training) performance has been lacking. Moreover, the underlying principles of iterative magnitude pruning, such as the pruning of smaller-magnitude weights and the role of the iterative process, are not fully understood or explained. In this work, we attempt to provide insights into these phenomena by empirically studying the volume/geometry and loss landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process.
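The pruning loop described in this abstract can be illustrated with a minimal sketch. It assumes the common formulation in which each round trains the masked network, removes a fixed fraction of the smallest-magnitude surviving weights, and rewinds the survivors to their original initialization. The names `magnitude_prune`, `iterative_magnitude_pruning`, and the `train` callback are hypothetical and are not taken from the paper.

```python
def magnitude_prune(weights, mask, fraction):
    """Return a new mask with `fraction` of the active weights removed.

    The weights with the smallest absolute value among those still
    active (mask == 1) are the ones that get pruned.
    """
    active = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(active) * fraction)
    smallest = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in smallest:
        new_mask[i] = 0
    return new_mask


def iterative_magnitude_pruning(init_weights, train, rounds, fraction):
    """Sketch of iterative magnitude pruning with rewinding.

    `train` is a hypothetical callback taking (weights, mask) and
    returning trained weights of the same length.
    """
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Rewind: every round restarts from the ORIGINAL initialization,
        # restricted to the weights that survived earlier rounds.
        start = [w * m for w, m in zip(init_weights, mask)]
        trained = train(start, mask)
        mask = magnitude_prune(trained, mask, fraction)
    ticket = [w * m for w, m in zip(init_weights, mask)]
    return mask, ticket


def toy_train(weights, mask):
    """Stand-in for real training: returns the masked weights unchanged."""
    return [w * m for w, m in zip(weights, mask)]


init = [0.5, -0.1, 0.3, -0.8, 0.05, 0.9, -0.2, 0.4]
final_mask, ticket = iterative_magnitude_pruning(
    init, toy_train, rounds=2, fraction=0.5
)
```

In a real experiment `train` would run gradient descent on the masked network; the stand-in here only exists so the loop can be exercised end to end.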


Dimensionality Reduction: Principal Component Analysis

#artificialintelligence

A dataset is made up of a number of features. As long as these features are related in some way to the target and are not too numerous, a machine learning model will be able to produce decent results after learning from the data. But if the number of features is high and most of them do not contribute to the model's learning, the model's performance drops and the time taken to output predictions increases. Transforming the original feature space into a lower-dimensional subspace is one way of performing dimensionality reduction, and Principal Component Analysis (PCA) does exactly this. So let's take a look at the building blocks of PCA.


Computer Vision and Deep Learning - Part 4

#artificialintelligence

FAST does not perform well when multiple features have to be detected in the same region of an image. Non-Maximum Suppression is used to address this: a score function V is computed for all the detected feature points. In a nutshell, FAST is faster than many existing feature detectors but performs poorly in the presence of a high level of noise, mainly because the noise alters the pixel values. The OpenCV documentation mentions two feature matching methods.


PCA on HyperSpectral Data

#artificialintelligence

Hyperspectral data expands the capability of image classification. It not only distinguishes different land cover types but also provides detailed characteristics of each land cover, such as minerals, soil, man-made structures (buildings, roads, etc.) and vegetation types. One disadvantage of hyperspectral data is that there are too many bands to process. Storing such a large amount of data is also a challenge, and the processing time increases with the data volume.


Consistency and Regression with Laplacian regularization in Reproducing Kernel Hilbert Space

arXiv.org Machine Learning

This note explains a way to look at reproducing kernel Hilbert spaces for regression problems. It consists in expressing kernel regression solutions with simple integral operator algebra, which we can approximate consistently from empirical data, providing the corresponding estimators of the solutions. Let's consider the classical regression problem arg min ‖f(x) − y‖. In practice we are going to restrict the search for a solution f ∈ F to a simpler function space, f ∈ H. Let's associate to it the canonical RKHS, see Aronszajn (1950), H. It is good to find a function f from X to R, but what if Y is a real Hilbert space? Indeed, it is natural to extend the theory of RKHS to vector-valued functions, Schwartz (1964). Once again we can build a Hilbert space of functions from X to Y; let's first define γ. Those are going to be the building elements of H. Definition 1 (The RKHS H).


Risks and Caution on applying PCA for Supervised Learning Problems

#artificialintelligence

The curse of dimensionality is a crucial problem when dealing with real-life datasets, which are generally high-dimensional. As the dimensionality of the feature space increases, the number of possible configurations can grow exponentially, and thus the fraction of configurations covered by the observations decreases. In such a scenario, Principal Component Analysis plays a major part in efficiently reducing the dimensionality of the data while retaining as much as possible of the variation present in the data set. Let us give a very brief introduction to Principal Component Analysis before delving into the actual problem. The central idea of Principal Component Analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number of correlated variables, while retaining the maximum possible variation present in the data set.
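As a rough illustration of that central idea, the sketch below finds the first principal component of a small data set: it centers the data, builds the sample covariance matrix, and extracts the leading eigenvalue and eigenvector by power iteration. It is a pure-Python sketch for exposition only; the function names are made up here, and in practice a library routine would normally be used instead.

```python
def center(data):
    """Subtract the per-feature mean from every row."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector.

    The start vector is deliberately non-uniform; power iteration still
    fails if it happens to be orthogonal to the leading eigenvector.
    """
    d = len(matrix)
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("start vector lies in the null space")
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))  # Rayleigh quotient
    return eigenvalue, v


# Two perfectly correlated features: one component carries all variance.
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
centered = center(data)
cov = covariance(centered)
eigenvalue, component = leading_eigenpair(cov)

# Share of total variance (trace of the covariance) kept by one component.
explained_ratio = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# One-dimensional representation of each observation.
scores = [sum(r[j] * component[j] for j in range(len(r))) for r in centered]
```

The eigenvalue is the variance captured along the component, which is why the ratio of the eigenvalue to the trace of the covariance matrix measures how much variation the reduced representation retains.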


Fast Approximate Multi-output Gaussian Processes

arXiv.org Machine Learning

Gaussian process regression models are an appealing machine learning method as they learn expressive non-linear models from exemplar data with minimal parameter tuning and estimate both the mean and covariance of unseen points. However, exponential computational complexity growth with the number of training samples has been a long-standing challenge. During training, one has to compute and invert an $N \times N$ kernel matrix at every iteration. Regression requires computation of an $m \times N$ kernel, where $N$ and $m$ are the number of training and test points respectively. In this work we show how approximating the covariance kernel using eigenvalues and eigenfunctions leads to an approximate Gaussian process with a significant reduction in training and regression complexity. Training with the proposed approach requires computing only an $N \times n$ eigenfunction matrix and an $n \times n$ inverse, where $n$ is a selected number of eigenvalues. Furthermore, regression now only requires an $m \times n$ matrix. Finally, in a special case the hyperparameter optimization is completely independent from the number of training samples. The proposed method can regress over multiple outputs, estimate the derivative of the regressor of any order, and learn the correlations between them. The computational complexity reduction, regression capabilities, and multi-output correlation learning are demonstrated in simulation examples.
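The matrix sizes quoted in this abstract are what one would expect from a standard low-rank argument, sketched here under assumptions that are not stated in the abstract: the kernel matrix is approximated as $K \approx \Phi \Lambda \Phi^\top$ with $\Phi$ the $N \times n$ eigenfunction matrix and $\Lambda$ the $n \times n$ diagonal matrix of eigenvalues, and observations carry Gaussian noise of variance $\sigma^2$. The symbols $\Phi$, $\Lambda$, $\Phi_*$, $\sigma^2$ and $\mathbf{y}$ are introduced here for illustration, and the paper's own derivation may differ. The Woodbury identity then gives

$$
\left(\Phi \Lambda \Phi^\top + \sigma^2 I_N\right)^{-1}
= \sigma^{-2}\left[ I_N - \Phi \left(\sigma^2 \Lambda^{-1} + \Phi^\top \Phi\right)^{-1} \Phi^\top \right],
$$

so the only matrix that has to be inverted is the $n \times n$ matrix $\sigma^2 \Lambda^{-1} + \Phi^\top \Phi$. With $\Phi_*$ denoting the $m \times n$ eigenfunction matrix at the test points and $\mathbf{y}$ the training targets, the predictive mean reduces to

$$
\boldsymbol{\mu}_* = \Phi_* \Lambda \Phi^\top \left(\Phi \Lambda \Phi^\top + \sigma^2 I_N\right)^{-1} \mathbf{y}
= \Phi_* \left(\sigma^2 \Lambda^{-1} + \Phi^\top \Phi\right)^{-1} \Phi^\top \mathbf{y},
$$

which involves only $N \times n$, $n \times n$ and $m \times n$ matrices.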


Comparative Study of Machine Learning Models and BERT on SQuAD

arXiv.org Machine Learning

This study aims to provide a comparative analysis of the performance of certain models popular in machine learning and the BERT model on the Stanford Question Answering Dataset (SQuAD). The analysis shows that the BERT model, which was once state-of-the-art on SQuAD, gives higher accuracy than the other models. However, BERT requires a greater execution time even when only 100 samples are used. This shows that higher accuracy comes at the cost of more training time. In the case of the preliminary machine learning models, by contrast, execution time on the full data is lower but accuracy is compromised.


Graph Spectral Feature Learning for Mixed Data of Categorical and Numerical Type

arXiv.org Machine Learning

Feature learning in the presence of mixed variable types, numerical and categorical, is an important issue for related modeling problems. For simple neighborhood queries in a mixed data space, standard practice is to consider numerical and categorical variables separately and combine them using suitable distance functions. Alternatives, such as Kernel learning or Principal Component, do not explicitly consider the inter-dependence structure among the mixed types of variables. In this work, we propose a novel strategy to explicitly model the probabilistic dependence structure among the mixed types of variables with an undirected graph. Spectral decomposition of the graph Laplacian provides the desired feature transformation. The eigenspectrum of the transformed feature space shows increased separability and more prominent clusterability among the observations. The main novelty of our paper lies in capturing interactions of the mixed feature types in an unsupervised framework using a graphical model. We numerically validate the implications of the feature learning strategy


Learning Conserved Networks from Flows

arXiv.org Machine Learning

The network reconstruction problem is one of the challenging problems in network science. This work deals with reconstructing networks in which the flows are conserved around the nodes. These networks are referred to as conserved networks. We propose a novel concept, the conservation graph, for describing conserved networks. The properties of the conservation graph are investigated. We develop a polynomial-time methodology to reconstruct conserved networks from flows by combining these graph properties with learning techniques. We show that exact network reconstruction is possible for radial networks. Further, we extend the methodology to reconstruct networks from noisy data. We demonstrate the proposed methods on different types of radial networks.